Abstract

The LASSO is a widely used statistical methodology for simultaneous estimation
and variable selection. In recent years, many authors have analyzed this technique from
both a theoretical and an applied point of view. We introduce and study the adaptive LASSO
problem for discretely observed ergodic diffusion processes. We prove the oracle properties
and also derive the asymptotic distribution of the LASSO estimator. Our theoretical
framework is based on the random field approach, and it applies to more general families
of regular statistical experiments in the sense of Ibragimov-Hasminskii (1981).
Furthermore, we perform a simulation study and a real data analysis to provide some evidence
on the applicability of this method.

Key words: discretely observed diffusion processes, model selection, oracle properties,
random fields, stochastic differential equations.
1 Introduction



The least absolute shrinkage and selection operator (LASSO) is a useful and well-studied
approach to the problem of model selection. Its major advantage is that it performs
parameter estimation and variable selection at the same time (see Tibshirani, 1996; Knight
and Fu, 2000; Efron et al., 2004). This is possible because the dimension of the
parameter space does not change, whereas it does with the information criteria approach (e.g.
AIC, BIC, etc.). The LASSO method only sets some parameters to zero to eliminate
them from the model. The LASSO method usually consists in the minimization of an $L_2$
norm under $L_1$ norm constraints on the parameters, so it typically combines a least squares
or maximum likelihood approach with constraints. The important property stating that the
correct parameters are set to zero by the LASSO method under the true data generating model
is called the oracle property (Fan and Li, 2001). As shown by Zou (2006), the classical
LASSO estimator applies the same amount of shrinkage to each parameter, and for this reason the resulting
model selection can be inconsistent. To overcome this drawback, it is possible to consider
an adaptive amount of shrinkage for each parameter (Zou, 2006).
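The textbook case of an orthonormal linear regression design makes the difference between uniform and adaptive shrinkage concrete. There, the LASSO solution is obtained by soft-thresholding each least squares coefficient (Tibshirani, 1996). The minimal Python sketch below assumes such a design. The coefficient values, the penalty level `lam` and the adaptive weights $1/|b_j|$ are illustrative assumptions, not quantities taken from this paper.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0), the one-coordinate LASSO solution."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(ols, lam, weights=None):
    """LASSO estimates for an orthonormal design (X'X = I).

    Minimises 0.5 * ||y - X b||^2 + lam * sum_j w_j |b_j|; its solution is the
    soft-thresholding of each least squares coefficient at level lam * w_j.
    weights=None gives the classical LASSO (w_j = 1 for every j).
    """
    if weights is None:
        weights = [1.0] * len(ols)
    return [soft_threshold(b, lam * w) for b, w in zip(ols, weights)]


# Hypothetical least squares estimates: one large, one small, one moderate coefficient.
ols = [3.0, 0.4, -1.5]
lam = 0.5

classical = lasso_orthonormal(ols, lam)
# Adaptive weights w_j = 1 / |b_j|: small coefficients are penalised more, large ones less.
adaptive = lasso_orthonormal(ols, lam, [1.0 / abs(b) for b in ols])

print(classical)  # [2.5, 0.0, -1.0]
print(adaptive)
```

The classical LASSO moves the large coefficient by the full amount 0.5. The adaptive version moves it by only about 0.17 and still sets the small coefficient to zero. This reduced bias on large effects is the intuition behind the adaptive scheme of Zou (2006).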

Originally, the LASSO procedure was introduced for linear regression problems. In
recent years, several authors have applied this approach to time series analysis,
mainly in the case of autoregressive models. To mention just a few, Wang
et al. (2007) consider the problem of shrinkage estimation of regression and autoregressive
coefficients, while Nardi and Rinaldo (2008) consider penalized order selection in an AR(p)
model. The VAR case was considered in Hsu et al. (2007). More recently, Caner (2009)
studied the LASSO method for general GMM estimators, also in the case of time series, and
Knight (2008) extended the LASSO approach to nearly singular designs.

In this paper we consider the LASSO approach for discretely observed diffusion processes.
In this case, the likelihood function is usually not known in closed form. Moreover, most
models used in applications are not necessarily linear. Instead of working on a
single approximation of the likelihood, we study the problem in terms of random fields (see
Yoshida, 2005). This framework encompasses all widely used methods in the literature on inference for
discretely sampled diffusion processes. Although we do not explicitly state the results in this
form, the proofs in this paper are based on the properties of random fields and are immediately
extensible to regular statistical experiments in the sense of Ibragimov-Hasminskii (1981).
That is, they apply to i.i.d. models as well as to regression and autoregressive models.

For diffusion processes, the LASSO method requires some additional care because the
parameters in the drift and in the diffusion coefficient converge at different rates.
We point out that the usual model selection strategy based on AIC (see Uchida and Yoshida,
2005) depends not only on the properties of the estimators but also on the method used to
approximate the likelihood. Indeed, AIC requires the calculation of the likelihood (see Iacus,
2008). By contrast, the present LASSO approach depends solely on the properties of
the estimator, so the problem of likelihood approximation is less critical.

It is worth mentioning that model selection for continuous time diffusion processes was
considered earlier in Uchida and Yoshida (2001) by means of information criteria.

The paper is organized as follows. Section 2 introduces the model and the regularity
assumptions and states the problem of LASSO estimation for discretely sampled diffusion
processes. Section 3 proves consistency and the oracle properties of the LASSO estimator. Section 4
contains a Monte Carlo analysis and one application to real financial data. Proofs are
collected in Section 5. Tables and figures are at the end of the manuscript.

2 The LASSO problem for diffusion models 

In the first part of this Section, we introduce the model on which we make inference, together with some
basic notation. Let $X_t$, $t\ge0$, be a $d$-dimensional diffusion process solution of the following
stochastic differential equation

$$dX_t = b(\alpha,X_t)\,dt + \sigma(\beta,X_t)\,dW_t \tag{2.1}$$

where $\alpha = (\alpha_1,\dots,\alpha_p)\in\Theta_p\subset\mathbb{R}^p$, $p\ge1$, $\beta = (\beta_1,\dots,\beta_q)\in\Theta_q\subset\mathbb{R}^q$, $q\ge1$, $b:\Theta_p\times\mathbb{R}^d\to\mathbb{R}^d$,
$\sigma:\Theta_q\times\mathbb{R}^d\to\mathbb{R}^d\times\mathbb{R}^d$, and $W_t$ is a standard Brownian motion in $\mathbb{R}^d$. We assume that the
functions $b$ and $\sigma$ are known up to the parameters $\alpha$ and $\beta$. We denote by $\theta = (\alpha,\beta)\in
\Theta_p\times\Theta_q = \Theta$ the parametric vector and by $\theta_0 = (\alpha_0,\beta_0)$ its unknown true value. For a
matrix $A$, we denote $A^{\otimes2} = AA'$ and by $A^{-1}$ the inverse of $A$. Let $\Sigma(\beta,x) = \sigma(\beta,x)^{\otimes2}$.
The sample path of $X_t$ is observed only at $n+1$ equidistant discrete times $t_i$, such that
$t_i - t_{i-1} = \Delta_n<\infty$ for $1\le i\le n$ (with $t_0 = 0$ and $t_n = T$). We denote by $\mathbf{X}_n = \{X_{t_i}\}_{0\le i\le n}$
our random sample with values in $\mathbb{R}^{n\times d}$.

The asymptotic scheme adopted in this paper is the following: $n\Delta_n\to\infty$, $\Delta_n\to0$ and
$n\Delta_n^2\to0$ as $n\to\infty$. This asymptotic framework is called rapidly increasing design. The
condition $n\Delta_n\to\infty$ requires $\Delta_n$ to shrink to zero more slowly than $n^{-1}$, while $n\Delta_n^2\to0$
requires it to shrink faster than $n^{-1/2}$. We need some assumptions on
the regularity of the process:

A1. There exists a constant $C$ such that

$$|b(\alpha_0,x) - b(\alpha_0,y)| + \|\sigma(\beta_0,x) - \sigma(\beta_0,y)\|\le C|x-y|.$$

A2. $\inf_{\beta,x}\det(\Sigma(\beta,x))>0$.

A3. The process $X$ is ergodic for every $\theta$ with invariant probability measure $\mu_\theta$.

A4. For all $m>0$ and for all $\theta$, $\sup_t E|X_t|^m<\infty$.

A5. For every $\theta$, the coefficients $b(\alpha,x)$ and $\sigma(\beta,x)$ are five times differentiable with respect
to $x$, and the derivatives are bounded by a polynomial function in $x$, uniformly in $\theta$.

A6. The coefficients $b(\alpha,x)$ and $\sigma(\beta,x)$ and all their partial derivatives with respect to $x$ up to
order 2 are three times differentiable with respect to $\theta$ for all $x$ in the state space. All
derivatives with respect to $\theta$ are bounded by a polynomial function in $x$, uniformly in
$\theta$.

A7. If $b(\alpha,x) = b(\alpha_0,x)$ and $\sigma(\beta,x) = \sigma(\beta_0,x)$ for all $x$ ($\mu_{\theta_0}$-almost surely),
then $\alpha = \alpha_0$ and $\beta = \beta_0$.

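Hereafter, we assume that conditions A1-A7 hold. Let $\mathcal{I}(\theta)$ be the positive definite
and invertible Fisher information matrix at $\theta$ given by

$$\mathcal{I}(\theta) = \begin{pmatrix}[\mathcal{I}_b^{kj}(\alpha)]_{k,j=1,\dots,p} & 0\\ 0 & [\mathcal{I}_\sigma^{kj}(\beta)]_{k,j=1,\dots,q}\end{pmatrix},$$

where

$$\mathcal{I}_b^{kj}(\alpha) = \int\frac{1}{\sigma^2(\beta,x)}\frac{\partial b(\alpha,x)}{\partial\alpha_k}\frac{\partial b(\alpha,x)}{\partial\alpha_j}\,\mu_\theta(dx),\qquad
\mathcal{I}_\sigma^{kj}(\beta) = 2\int\frac{1}{\sigma^2(\beta,x)}\frac{\partial\sigma(\beta,x)}{\partial\beta_k}\frac{\partial\sigma(\beta,x)}{\partial\beta_j}\,\mu_\theta(dx).$$

Moreover, we consider the matrix

$$\varphi(n) = \begin{pmatrix}\frac{1}{n\Delta_n}I_p & 0\\ 0 & \frac{1}{n}I_q\end{pmatrix},$$

where $I_p$ and $I_q$ are the identity matrices of order $p$ and $q$, respectively.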
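As a worked example of the two rates, take the sampling used in Section 4.1, $n = 1000$ and $\Delta_n = 0.1$, so that $n\Delta_n = 100$. Then

$$\varphi(n)^{-1/2} = \begin{pmatrix}\sqrt{n\Delta_n}\,I_p & 0\\ 0 & \sqrt{n}\,I_q\end{pmatrix} = \begin{pmatrix}10\,I_p & 0\\ 0 & \sqrt{1000}\,I_q\end{pmatrix}\approx\begin{pmatrix}10\,I_p & 0\\ 0 & 31.6\,I_q\end{pmatrix}.$$

The drift parameters are therefore estimated at rate $10$ and the diffusion parameters at rate about $31.6$. At this sample size $n\Delta_n^2 = 10$, so the condition $n\Delta_n^2\to0$ is far from being met. This agrees with the remark in Section 4 that the asymptotic framework is not completely realized there.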

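In order to introduce the LASSO problem, we consider a random field $\mathbb{H}_n:\mathbb{R}^{n\times d}\times\Theta\to\mathbb{R}$
admitting first and second derivatives with respect to $\theta$. We denote by $\dot{\mathbb{H}}_n(\mathbf{X}_n,\theta)$ the
vector of the first derivatives and by $\ddot{\mathbb{H}}_n(\mathbf{X}_n,\theta)$ the Hessian matrix. Furthermore, we assume
that the following conditions hold:

B1. for each $\theta\in\Theta$, we have that

$$\varphi(n)^{1/2}\,\ddot{\mathbb{H}}_n(\mathbf{X}_n,\theta)\,\varphi(n)^{1/2}\xrightarrow{p}\mathcal{I}(\theta); \tag{2.2}$$

B2. for each $\theta\in\Theta$, let $\tilde\theta_n:\mathbb{R}^{n\times d}\to\Theta$ be a consistent estimator of $\theta$ given by

$$\tilde\theta_n = \arg\min_\theta\mathbb{H}_n(\mathbf{X}_n,\theta)$$

such that

$$\varphi(n)^{-1/2}(\tilde\theta_n-\theta)\xrightarrow{d}N(0,\mathcal{I}(\theta)^{-1}). \tag{2.3}$$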

An example of a random field (contrast function) satisfying assumptions B1-B2 is given
by the quasi-likelihood function $\mathbb{H}_n(\mathbf{X}_n,\theta) = l_n(\mathbf{X}_n,\theta)$ obtained by means of the Euler
approximation (see Kessler, 1997; Yoshida, 2005), that is

$$l_n(\mathbf{X}_n,\theta) = \frac12\sum_{i=1}^n\left\{\log\det(\Sigma_{i-1}(\beta)) + \frac{1}{\Delta_n}[\Delta X_i - \Delta_n b_{i-1}(\alpha)]'\,\Sigma_{i-1}^{-1}(\beta)\,[\Delta X_i - \Delta_n b_{i-1}(\alpha)]\right\} \tag{2.4}$$

where $\Delta X_i = X_{t_i} - X_{t_{i-1}}$, $\Sigma_i(\beta) = \Sigma(\beta,X_{t_i})$ and $b_i(\alpha) = b(\alpha,X_{t_i})$. Then the unpenalized
estimator

$$\tilde\theta_n = \arg\min_\theta l_n(\mathbf{X}_n,\theta)$$

satisfies assumption B2. For other examples, the reader can consult Bibby and Sorensen
(1995), Kessler and Sorensen (1999), Nicolau (2002) and Ait-Sahalia (2008).
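The following is a minimal Python sketch of the contrast (2.4) in the scalar case ($d = 1$). There $\Sigma_{i-1}(\beta) = \sigma^2(\beta,X_{t_{i-1}})$, and the quadratic form reduces to a squared residual divided by $\Delta_n\sigma^2$. The function names and the example parametrization (the five-parameter model of Section 4.1) are illustrative assumptions. In practice the contrast would be passed to a numerical optimizer to obtain $\tilde\theta_n$.

```python
import math


def euler_contrast(x, delta, drift, diffusion):
    """Euler quasi-likelihood contrast (2.4) for a scalar diffusion.

    x         -- observations X_{t_0}, ..., X_{t_n} on an equidistant grid
    delta     -- sampling interval Delta_n
    drift     -- function v -> b(alpha, v) with alpha already fixed
    diffusion -- function v -> sigma(beta, v) with beta already fixed
    """
    total = 0.0
    for x_prev, x_next in zip(x[:-1], x[1:]):
        s2 = diffusion(x_prev) ** 2              # Sigma_{i-1}(beta) in the scalar case
        resid = x_next - x_prev - delta * drift(x_prev)
        total += math.log(s2) + resid * resid / (delta * s2)
    return 0.5 * total


def model_41(theta):
    """Coefficients of dX = -theta1 (X - theta2) dt + (theta3 + theta4 X)^theta5 dW."""
    t1, t2, t3, t4, t5 = theta

    def drift(v):
        return -t1 * (v - t2)

    def diffusion(v):
        return (t3 + t4 * v) ** t5               # requires t3 + t4 * v > 0

    return drift, diffusion


drift, diffusion = model_41((1.0, 10.0, 0.0, 4.0, 0.5))
print(euler_contrast([10.0, 10.3, 9.8, 10.1], 0.1, drift, diffusion))
```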

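The classical adaptive LASSO objective function, in this case, would be given by

$$\mathbb{H}_n(\mathbf{X}_n,\theta) + \sum_{j=1}^p\lambda_{n,j}|\alpha_j| + \sum_{k=1}^q\gamma_{n,k}|\beta_k| \tag{2.5}$$

where $\lambda_{n,j}$ and $\gamma_{n,k}$ are positive real values representing an adaptive amount of
shrinkage for each element of $\alpha$ and $\beta$. Nevertheless, following the same approach as Wang
and Leng (2007), we observe that a Taylor expansion of $\mathbb{H}_n(\mathbf{X}_n,\theta)$ at $\tilde\theta_n$
immediately gives

$$\begin{aligned}
\mathbb{H}_n(\mathbf{X}_n,\theta) &= \mathbb{H}_n(\mathbf{X}_n,\tilde\theta_n) + \dot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)(\theta-\tilde\theta_n)' + \frac12(\theta-\tilde\theta_n)\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)(\theta-\tilde\theta_n)' + o_p(1)\\
&= \mathbb{H}_n(\mathbf{X}_n,\tilde\theta_n) + \frac12(\theta-\tilde\theta_n)\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)(\theta-\tilde\theta_n)' + o_p(1),
\end{aligned}$$

where the first-order term vanishes because $\tilde\theta_n$ minimizes $\mathbb{H}_n(\mathbf{X}_n,\cdot)$, so that $\dot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n) = 0$.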
Therefore, we use the following objective function

$$\mathcal{F}(\theta) = (\theta-\tilde\theta_n)\,\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\,(\theta-\tilde\theta_n)' + \sum_{j=1}^p\lambda_{n,j}|\alpha_j| + \sum_{k=1}^q\gamma_{n,k}|\beta_k| \tag{2.6}$$

instead of (2.5). The LASSO-type estimator $\hat\theta_n:\mathbb{R}^{n\times d}\to\Theta$ is defined as

$$\hat\theta_n = (\hat\alpha_n,\hat\beta_n) = \arg\min_\theta\mathcal{F}(\theta). \tag{2.7}$$

The function $\mathcal{F}(\theta)$ is a penalized quadratic form, and it has the advantage of providing a
unified theoretical framework. Indeed, the objective function (2.5) allows us to perform the LASSO
procedure correctly only if $\mathbb{H}_n$ is strictly convex, and this restricts the choice
of the possible contrast functions for model (2.1). The function (2.6) overcomes this
limitation. We also point out that $\mathcal{F}(\theta)$ has two penalty terms, because the drift and diffusion
parameters $\alpha_j$ and $\beta_k$ are well separated and have different rates of convergence.
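Since (2.6) is a quadratic form plus weighted $L_1$ penalties, any LASSO solver can minimize it once $\tilde\theta_n$ and $\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)$ are available. As an illustration only, the Python sketch below uses cyclic coordinate descent with soft-thresholding. This particular algorithm, the function name `minimise_F` and the toy inputs are assumptions of this example, not the procedure used for the results in Section 4.

For a single coordinate, with the others held fixed, (2.6) reads $H_{jj}\theta_j^2 - 2c_j\theta_j + w_j|\theta_j| + \text{const}$, where $c_j = H_{jj}\tilde\theta_j - \sum_{k\neq j}H_{jk}(\theta_k-\tilde\theta_k)$. Its minimizer is $\operatorname{sign}(c_j)(|c_j| - w_j/2)_+/H_{jj}$. The weights $w$ stack $(\lambda_{n,1},\dots,\lambda_{n,p},\gamma_{n,1},\dots,\gamma_{n,q})$.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def minimise_F(hessian, theta_tilde, weights, max_iter=1000, tol=1e-12):
    """Cyclic coordinate descent for
        F(theta) = (theta - theta_tilde) H (theta - theta_tilde)' + sum_j w_j |theta_j|,
    where H (a list of rows) is symmetric positive definite and
    weights = (lambda_{n,1}, ..., lambda_{n,p}, gamma_{n,1}, ..., gamma_{n,q}).
    """
    m = len(theta_tilde)
    theta = list(theta_tilde)                      # start from the unpenalised estimate
    for _ in range(max_iter):
        largest_step = 0.0
        for j in range(m):
            # linear coefficient c_j of theta_j when the other coordinates are held fixed
            c = hessian[j][j] * theta_tilde[j] - sum(
                hessian[j][k] * (theta[k] - theta_tilde[k]) for k in range(m) if k != j
            )
            new = soft_threshold(c, weights[j] / 2.0) / hessian[j][j]
            largest_step = max(largest_step, abs(new - theta[j]))
            theta[j] = new
        if largest_step < tol:                     # no coordinate moved: converged
            break
    return theta


# Hypothetical toy inputs with two parameters.
H = [[2.0, 0.5], [0.5, 1.0]]
theta_tilde = [1.0, 0.1]
print(minimise_F(H, theta_tilde, [0.2, 1.0]))      # the second coordinate is set exactly to zero
```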

3 Oracle properties 

As observed by Fan and Li (2001), a good procedure should have the oracle properties. That
is, it:

• identifies the right subset model;

• has the optimal estimation rate and converges to a Gaussian random variable $N(0,\Sigma)$,
where $\Sigma$ is the covariance matrix of the true subset model.

The aim of this Section is to prove that the LASSO-type estimator $\hat\theta_n$ behaves well in
the oracle sense.

As shown by Zou (2006), the classical LASSO estimator cannot be as efficient as the
oracle, and its selection results can be inconsistent, whereas its adaptive version has the
oracle properties. Without loss of generality, we assume that the true model, indicated by
$\theta_0 = (\alpha_0,\beta_0)$, has parameters $\alpha_{0j}$ and $\beta_{0k}$ equal to zero for $p_0<j\le p$ and $q_0<k\le q$, while
$\alpha_{0j}\neq0$ and $\beta_{0k}\neq0$ for $1\le j\le p_0$ and $1\le k\le q_0$. To study the asymptotic properties of
the LASSO-type estimator $\hat\theta_n$, we consider the following conditions:

C1. $\mu_n/\sqrt{n\Delta_n}\to0$ and $\nu_n/\sqrt n\to0$, where $\mu_n = \max\{\lambda_{n,j},\ 1\le j\le p_0\}$ and $\nu_n = \max\{\gamma_{n,k},\ 1\le k\le q_0\}$;

C2. $\kappa_n/\sqrt{n\Delta_n}\to\infty$ and $\omega_n/\sqrt n\to\infty$, where $\kappa_n = \min\{\lambda_{n,j},\ j>p_0\}$ and $\omega_n = \min\{\gamma_{n,k},\ k>q_0\}$.

Assumption C1 says that the maximal tuning coefficients for the parameters $\alpha_j$ and
$\beta_k$, with $1\le j\le p_0$ and $1\le k\le q_0$, grow more slowly than $\sqrt{n\Delta_n}$ and $\sqrt n$, respectively.
This holds, for instance, if they tend to zero faster than $(n\Delta_n)^{-1/2}$ and $n^{-1/2}$, i.e. if $\sqrt{n\Delta_n}\,\mu_n\to0$
and $\sqrt n\,\nu_n\to0$. Analogously, C2 means that the minimal tuning coefficients for the parameters $\alpha_j$
and $\beta_k$, with $j>p_0$ and $k>q_0$, tend to infinity faster than $\sqrt{n\Delta_n}$ and $\sqrt n$, respectively.
Theorem 1. Under conditions B1, B2 and C1, one has that

$$\hat\theta_n\xrightarrow{p}\theta_0.$$

For the sake of simplicity, we denote by $\theta^* = (\alpha^*,\beta^*)$ the vector corresponding to the
nonzero parameters, where $\alpha^* = (\alpha_1,\dots,\alpha_{p_0})$ and $\beta^* = (\beta_1,\dots,\beta_{q_0})$. Likewise,
$\theta^\circ = (\alpha^\circ,\beta^\circ)$ denotes the vector corresponding to the zero parameters, where $\alpha^\circ = (\alpha_{p_0+1},\dots,\alpha_p)$ and
$\beta^\circ = (\beta_{q_0+1},\dots,\beta_q)$. Therefore, $\theta = (\alpha,\beta) = (\alpha^*,\alpha^\circ,\beta^*,\beta^\circ)$ and $\hat\theta_n = (\hat\alpha_n^*,\hat\alpha_n^\circ,\hat\beta_n^*,\hat\beta_n^\circ)$.

Theorem 2. Under conditions B1, B2 and C2, we have that

$$P(\hat\alpha_n^\circ = 0)\to1\quad\text{and}\quad P(\hat\beta_n^\circ = 0)\to1. \tag{3.1}$$

From Theorem 1 we can conclude that the estimator $\hat\theta_n$ is consistent. Furthermore,
Theorem 2 says that all the estimates of the zero parameters are correctly set equal to
zero with probability tending to 1. In other words, the model selection procedure is consistent,
and the true subset model is correctly identified with probability tending to 1.

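To complete our program, we derive the asymptotic distribution of $\hat\theta_n^*$. To this end, we denote
by $\mathcal{I}_0(\theta_0)$ the $(p_0+q_0)\times(p_0+q_0)$ submatrix of $\mathcal{I}(\theta)$ at the point $\theta_0$, that is

$$\mathcal{I}_0(\theta_0) = \begin{pmatrix}[\mathcal{I}_b^{kj}(\alpha_0)]_{k,j=1,\dots,p_0} & 0\\ 0 & [\mathcal{I}_\sigma^{kj}(\beta_0)]_{k,j=1,\dots,q_0}\end{pmatrix},$$

and introduce the following rate of convergence matrix

$$\varphi_0(n) = \begin{pmatrix}\frac{1}{n\Delta_n}I_{p_0} & 0\\ 0 & \frac1n I_{q_0}\end{pmatrix}.$$

The next result establishes that the estimator $\hat\theta_n^*$ is as efficient as the oracle estimator.

Theorem 3 (Oracle property). Under conditions B1, B2, C1 and C2, we have that

$$\varphi_0(n)^{-1/2}(\hat\theta_n^*-\theta_0^*)\xrightarrow{d}N(0,\mathcal{I}_0^{-1}(\theta_0^*)). \tag{3.2}$$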
Clearly, the theoretical and practical implications of our method depend on the specification
of the tuning parameters $\lambda_{n,j}$ and $\gamma_{n,k}$. As observed in Wang and Leng (2007), these values
could be obtained by means of model selection criteria such as generalized cross-validation,
the Akaike information criterion or the Bayesian information criterion. Unfortunately, this solution is
computationally heavy and hence impracticable. Therefore, the tuning parameters should be
chosen as in Zou (2006), in the following way:

$$\lambda_{n,j} = \lambda_0|\tilde\alpha_{n,j}|^{-\delta_1},\qquad\gamma_{n,k} = \gamma_0|\tilde\beta_{n,k}|^{-\delta_2} \tag{3.3}$$

where $\tilde\alpha_{n,j}$ and $\tilde\beta_{n,k}$ are the unpenalized estimators of $\alpha_j$ and $\beta_k$, respectively, and $\delta_1,\delta_2>0$ are
usually set equal to one. The asymptotic results hold under the additional conditions

$$\frac{\lambda_0}{\sqrt{n\Delta_n}}\to0,\qquad(n\Delta_n)^{(\delta_1-1)/2}\lambda_0\to\infty,\qquad\frac{\gamma_0}{\sqrt n}\to0,\qquad n^{(\delta_2-1)/2}\gamma_0\to\infty.$$
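Given the unpenalized estimates, (3.3) is a one-line computation. The minimal Python sketch below computes $\lambda_{n,j}$, or $\gamma_{n,k}$ when given $\gamma_0$ and $\delta_2$. The function name `adaptive_weights` is hypothetical. Treating an exactly zero preliminary estimate as an infinite penalty is a convention assumed here; the text does not state it.

```python
import math


def adaptive_weights(estimates, base, delta=1.0):
    """Adaptive LASSO tuning parameters base * |estimate_j|^(-delta), as in (3.3).

    estimates -- unpenalised estimates (e.g. the drift part of the QMLE)
    base      -- lambda_0 for drift parameters, gamma_0 for diffusion parameters
    delta     -- delta_1 or delta_2 (the text notes they are usually set to one)
    """
    weights = []
    for est in estimates:
        if est == 0.0:
            weights.append(math.inf)   # assumed convention: an exact zero stays at zero
        else:
            weights.append(base * abs(est) ** (-delta))
    return weights


# Hypothetical preliminary drift estimates: one clearly non-zero, one close to zero.
print(adaptive_weights([2.0, -0.05], base=1.0))
```

With $\delta = 1$, the estimate near zero receives a penalty about forty times larger than the clearly non-zero one. This is how (3.3) concentrates shrinkage on the parameters that are likely to be zero.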

4 Performance of the LASSO method for small sample size

In this section we perform a small Monte Carlo analysis to check whether the LASSO method
is able to select a specified model also in small samples. We also apply the method to a
benchmark data set often used in the literature on model selection. The asymptotic framework of
this paper is not completely realized in the next two applications, but we nevertheless test
what happens outside the theoretical framework.

In both cases, we do not claim to give an extensive analysis of the method, because the
previous theorems already prove the asymptotic validity of the LASSO approach for diffusion
processes. Instead, we just want to show some evidence on simulated and real data to give
a feeling for the applicability of the method.

4.1 A simulation experiment 

We reproduce the experimental design of Uchida and Yoshida (2005). Therefore, we consider
a diffusion process solution of the following stochastic differential equation

$$dX_t = -(X_t-10)\,dt + 2\sqrt{X_t}\,dW_t,\qquad X_0 = 10.$$
We simulate 1000 trajectories of this process using the second Milstein scheme, i.e. the data
are simulated according to

$$X_{t+\Delta t} = X_t + \left(b-\tfrac12\sigma\sigma_x\right)\Delta t + \sigma Z\sqrt{\Delta t} + \tfrac12\sigma\sigma_x\Delta t\,Z^2 + \Delta t^{3/2}\left(\tfrac12 b\sigma_x + \tfrac12 b_x\sigma + \tfrac14\sigma^2\sigma_{xx}\right)Z + \Delta t^2\left(\tfrac12 bb_x + \tfrac14 b_{xx}\sigma^2\right),$$

with $Z\sim N(0,1)$ and all coefficients evaluated at $X_t$. Here $b_x$ and $b_{xx}$ (resp. $\sigma_x$ and $\sigma_{xx}$) are the first and second partial derivatives
in $x$ of the drift (resp. diffusion) coefficient (see Milstein, 1978). This scheme has weak
second-order convergence and guarantees good numerical stability. Data are simulated at
high frequency and resampled at the lower frequency $\Delta_n = 0.1$, for a total of $n = 1000$
observations. The simulations are done using the sde package (see Iacus, 2008) for the R statistical
environment. We then estimate via LASSO the following five-dimensional parametric model

$$dX_t = -\theta_1(X_t-\theta_2)\,dt + (\theta_3+\theta_4X_t)^{\theta_5}\,dW_t,$$

and the true model is ($\theta_1 = 1$, $\theta_2 = 10$, $\theta_3 = 0$, $\theta_4 = 4$, $\theta_5 = 0.5$), since $(4X_t)^{1/2} = 2\sqrt{X_t}$. The LASSO estimator
is obtained by plugging into the objective function $\mathcal{F}$ the quasi-likelihood estimator and the
Hessian matrix obtained from the function (2.4), particularized for the present model. For
the penalization term we use $\lambda_0 = \gamma_0 = 1$ in (3.3).
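The simulation step can be sketched as follows. This is an independent Python re-implementation of the second Milstein recursion displayed above, for the model $dX_t = -(X_t-10)dt + 2\sqrt{X_t}dW_t$. It is not the sde package code used in the paper. The choice of ten sub-steps per $\Delta_n$, the function names and the small positivity floor inside the square root are assumptions of this sketch.

```python
import math
import random


def milstein2_step(x, dt, z, b, bx, bxx, s, sx, sxx):
    """One step of the second Milstein scheme for dX = b(X) dt + s(X) dW."""
    bv, bxv, bxxv = b(x), bx(x), bxx(x)
    sv, sxv, sxxv = s(x), sx(x), sxx(x)
    return (x
            + (bv - 0.5 * sv * sxv) * dt
            + sv * z * math.sqrt(dt)
            + 0.5 * sv * sxv * dt * z * z
            + dt ** 1.5 * (0.5 * bv * sxv + 0.5 * bxv * sv + 0.25 * sv * sv * sxxv) * z
            + dt ** 2 * (0.5 * bv * bxv + 0.25 * bxxv * sv * sv))


EPS = 1e-12  # assumed floor keeping sqrt(x) defined if a step ever reaches x <= 0


# Coefficients of dX = -(X - 10) dt + 2 sqrt(X) dW and their x-derivatives.
def b(x):
    return -(x - 10.0)


def bx(x):
    return -1.0


def bxx(x):
    return 0.0


def s(x):
    return 2.0 * math.sqrt(max(x, EPS))


def sx(x):
    return 1.0 / math.sqrt(max(x, EPS))


def sxx(x):
    return -0.5 / max(x, EPS) ** 1.5


def simulate_path(n=1000, delta=0.1, substeps=10, x0=10.0, seed=None):
    """Simulate on a fine grid of step delta / substeps and keep one value every
    `substeps` steps, returning n + 1 observations spaced delta apart."""
    rng = random.Random(seed)
    dt = delta / substeps
    x, path = x0, [x0]
    for _ in range(n):
        for _ in range(substeps):
            x = milstein2_step(x, dt, rng.gauss(0.0, 1.0), b, bx, bxx, s, sx, sxx)
        path.append(x)
    return path


path = simulate_path(seed=1)
print(len(path), sum(path) / len(path))
```

The resulting $n + 1 = 1001$ values, sampled at $\Delta_n = 0.1$, can be fed to a contrast such as (2.4) to obtain first-stage estimates.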



Figure 1 reports the estimated densities of the estimates of the parameters $\theta_i$, $i = 1,\dots,5$,
together with their true values. These densities are estimated from the 1000 Monte Carlo
replications. Figure 1 indicates that all parameters are correctly estimated most of the time
and, in particular, that the parameter $\theta_3$ is often estimated as zero.

Figure 1 about here

4.2 An example of use in the problem of identification of the term structure of interest rates

In this section we reanalyze the U.S. interest rates monthly data from 06/1964 to 12/1989,
for a total of 307 observations. These data have been analyzed by many authors, including
Nowman (1997), Ait-Sahalia (1996) and Yu and Phillips (2001), to mention just a few references.
We do not claim to give the definitive answer on the subject; we only want to analyze the
effect of model selection via the LASSO in a real application. The data used for this
application were taken from the R package Ecdat by Croissant (2006). These authors
all fit a version of the so-called CKLS model (from Chan et al., 1992), which is the
solution $X_t$ of the following stochastic differential equation

$$dX_t = (\alpha+\beta X_t)\,dt + \sigma X_t^\gamma\,dW_t.$$

This model encompasses several other models, depending on which parameters are non-null,
as Table 1 shows. This is why model selection on the CKLS model is quite
appealing.

Table 1 about here

Our application of the LASSO method is reported in Table 2, along with the results from Yu
and Phillips (2001) for comparison.

Table 2 about here

Although we have proven that, asymptotically, the LASSO provides consistent estimates
with the oracle properties, several authors have noted that this is not always the case in
finite samples. In this application, we estimate the parameters by the quasi-likelihood
method (QMLE in the table) in the first stage, then set the penalties as in (3.3) and run
the LASSO optimization. We estimate the CKLS parameters via the LASSO using mild
penalties (i.e. $\lambda_0 = \gamma_0 = 1$ in (3.3)) and strong penalties (i.e. $\lambda_0 = \gamma_0 = 10$). The strong
penalties suggest that the model does not contain the term $\beta$. In both cases, the LASSO
estimation suggests $\gamma = 3/2$, hence a model quite close to Cox, Ingersoll and Ross (1980).
Since the LASSO is a shrinkage estimator, its estimates have much lower standard errors than
the other methods. As said, this application is meant to show the applicability of the
LASSO method, and we do not claim to draw in-depth conclusions from this empirical
evidence, which is outside our competence.



5 Proofs 

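Proof of Theorem 1. Following Fan and Li (2001), the existence of a consistent local
minimizer is implied by the fact that, for an arbitrarily small $\varepsilon>0$, there exists a sufficiently
large constant $C$ such that

$$\lim_{n\to\infty}P\left\{\inf_{z\in\mathbb{R}^{p+q}:\,|z| = C}\mathcal{F}(\theta_0+\varphi(n)^{1/2}z)>\mathcal{F}(\theta_0)\right\}>1-\varepsilon, \tag{5.1}$$

with $z = (u,v) = (u_1,\dots,u_p,v_1,\dots,v_q)$. After some calculations, we obtain that

$$\begin{aligned}
\mathcal{F}(\theta_0+\varphi(n)^{1/2}z)-\mathcal{F}(\theta_0)
&= z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}z' + 2z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}\varphi(n)^{-1/2}(\theta_0-\tilde\theta_n)'\\
&\quad+\sum_{j=1}^{p}\lambda_{n,j}\left(\left|\alpha_{0j}+\frac{u_j}{\sqrt{n\Delta_n}}\right|-|\alpha_{0j}|\right)+\sum_{k=1}^{q}\gamma_{n,k}\left(\left|\beta_{0k}+\frac{v_k}{\sqrt n}\right|-|\beta_{0k}|\right)\\
&\ge z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}z' + 2z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}\varphi(n)^{-1/2}(\theta_0-\tilde\theta_n)'\\
&\quad+\sum_{j=1}^{p_0}\lambda_{n,j}\left(\left|\alpha_{0j}+\frac{u_j}{\sqrt{n\Delta_n}}\right|-|\alpha_{0j}|\right)+\sum_{k=1}^{q_0}\gamma_{n,k}\left(\left|\beta_{0k}+\frac{v_k}{\sqrt n}\right|-|\beta_{0k}|\right)\\
&\ge z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}z' + 2z\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}\varphi(n)^{-1/2}(\theta_0-\tilde\theta_n)'\\
&\quad-p_0\frac{\mu_n}{\sqrt{n\Delta_n}}|u|-q_0\frac{\nu_n}{\sqrt n}|v|\\
&= H_1+H_2+H_3.
\end{aligned}$$

The first inequality holds because the dropped terms with $j>p_0$ and $k>q_0$ are nonnegative
($\alpha_{0j} = 0$ and $\beta_{0k} = 0$ there). The second follows from $|a+b|-|a|\ge-|b|$.

Condition C1 gives $H_3 = o_p(1)$. Furthermore, since
$|z| = C$, $H_1$ is uniformly larger than $\tau_{\min}(\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2})C^2$, and

$$\tau_{\min}(\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2})\,C^2\xrightarrow{p}C^2\,\tau_{\min}(\mathcal{I}(\theta_0)),$$

where $\tau_{\min}(A)$ is the minimum eigenvalue of $A$. By (2.2) and (2.3),

$$\left|\varphi(n)^{1/2}\ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)\varphi(n)^{1/2}\varphi(n)^{-1/2}(\theta_0-\tilde\theta_n)'\right| = O_p(1),$$

so $H_2$ is bounded in probability by a quantity that is linear in $C$. Therefore, for $C$ sufficiently large,
the quadratic term $H_1$ dominates $H_2+H_3$ with arbitrarily large probability. This implies
(5.1). The proof is completed by noting that $\mathcal{F}(\theta)$ is strictly convex, so
the local minimum is the global one. □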



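Proof of Theorem 2. For $j = p_0+1,\dots,p$,

$$\frac{1}{\sqrt{n\Delta_n}}\frac{\partial\mathcal{F}(\theta)}{\partial\alpha_j}\bigg|_{\theta = \hat\theta_n} = \frac{2}{\sqrt{n\Delta_n}}\ddot{\mathbb{H}}_n^{(j)}(\mathbf{X}_n,\tilde\theta_n)(\hat\theta_n-\tilde\theta_n)' + \frac{\lambda_{n,j}}{\sqrt{n\Delta_n}}\operatorname{sgn}(\hat\alpha_{n,j}),$$

where $\ddot{\mathbb{H}}_n^{(j)}$ is the $j$-th row of $\ddot{\mathbb{H}}_n$. The first term of the previous expression is $O_p(1)$, while
$\lambda_{n,j}/\sqrt{n\Delta_n}\ge\kappa_n/\sqrt{n\Delta_n}\to\infty$ by C2. By Theorem 1, $\hat\theta_n$ is a minimizer of $\mathcal{F}$, so
the derivative must vanish whenever $\hat\alpha_{n,j}\neq0$. Since the second term diverges while the first is bounded in probability,
necessarily $P(\hat\alpha_{n,j} = 0)\to1$ (see the proof of Theorem 2 in Wang and Leng, 2007). Similarly, for the estimators of the
coefficients $\beta_k$, $k = q_0+1,\dots,q$, we have that

$$\frac{1}{\sqrt n}\frac{\partial\mathcal{F}(\theta)}{\partial\beta_k}\bigg|_{\theta = \hat\theta_n} = \frac{2}{\sqrt n}\ddot{\mathbb{H}}_n^{(p+k)}(\mathbf{X}_n,\tilde\theta_n)(\hat\theta_n-\tilde\theta_n)' + \frac{\gamma_{n,k}}{\sqrt n}\operatorname{sgn}(\hat\beta_{n,k}),$$

and the same arguments give $P(\hat\beta_{n,k} = 0)\to1$. □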



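Proof of Theorem 3. Before starting the proof, we need to introduce the following
notation. Let

• $\Gamma_\alpha^{**}$ be the $p_0\times p_0$ matrix with elements $[\ddot{\mathbb{H}}_n]_{kj}$, $k,j = 1,\dots,p_0$,

• $\Gamma_\alpha^{*\circ}$ be the $p_0\times(p-p_0)$ matrix with elements $[\ddot{\mathbb{H}}_n]_{kj}$, $k = 1,\dots,p_0$, $j = p_0+1,\dots,p$,

• $\Gamma_\alpha^{\circ\circ}$ be the $(p-p_0)\times(p-p_0)$ matrix with elements $[\ddot{\mathbb{H}}_n]_{kj}$, $k,j = p_0+1,\dots,p$,

• $\Gamma_\beta^{**}$ be the $q_0\times q_0$ matrix with elements $[\ddot{\mathbb{H}}_n]_{p+k,p+j}$, $k,j = 1,\dots,q_0$,

• $\Gamma_\beta^{*\circ}$ be the $q_0\times(q-q_0)$ matrix with elements $[\ddot{\mathbb{H}}_n]_{p+k,p+j}$, $k = 1,\dots,q_0$, $j = q_0+1,\dots,q$,

• $\Gamma_\beta^{\circ\circ}$ be the $(q-q_0)\times(q-q_0)$ matrix with elements $[\ddot{\mathbb{H}}_n]_{p+k,p+j}$, $k,j = q_0+1,\dots,q$,

where $\ddot{\mathbb{H}}_n = \ddot{\mathbb{H}}_n(\mathbf{X}_n,\tilde\theta_n)$. By (2.2),

$$\frac{1}{n\Delta_n}\begin{pmatrix}\Gamma_\alpha^{**}&\Gamma_\alpha^{*\circ}\\ (\Gamma_\alpha^{*\circ})'&\Gamma_\alpha^{\circ\circ}\end{pmatrix}\xrightarrow{p}\mathcal{I}_\alpha = \begin{pmatrix}\mathcal{I}_\alpha^{**}&\mathcal{I}_\alpha^{*\circ}\\ (\mathcal{I}_\alpha^{*\circ})'&\mathcal{I}_\alpha^{\circ\circ}\end{pmatrix},$$

with $\mathcal{I}_\alpha^{**} = [\mathcal{I}_b^{kj}(\alpha_0)]_{k,j}$, $k,j = 1,\dots,p_0$; $\mathcal{I}_\alpha^{*\circ} = [\mathcal{I}_b^{kj}(\alpha_0)]_{k,j}$, $k = 1,\dots,p_0$, $j = p_0+1,\dots,p$;
$\mathcal{I}_\alpha^{\circ\circ} = [\mathcal{I}_b^{kj}(\alpha_0)]_{k,j}$, $k,j = p_0+1,\dots,p$. Similarly,

$$\frac{1}{n}\begin{pmatrix}\Gamma_\beta^{**}&\Gamma_\beta^{*\circ}\\ (\Gamma_\beta^{*\circ})'&\Gamma_\beta^{\circ\circ}\end{pmatrix}\xrightarrow{p}\mathcal{I}_\beta = \begin{pmatrix}\mathcal{I}_\beta^{**}&\mathcal{I}_\beta^{*\circ}\\ (\mathcal{I}_\beta^{*\circ})'&\mathcal{I}_\beta^{\circ\circ}\end{pmatrix},$$

with $\mathcal{I}_\beta^{**} = [\mathcal{I}_\sigma^{kj}(\beta_0)]_{k,j}$, $k,j = 1,\dots,q_0$; $\mathcal{I}_\beta^{*\circ} = [\mathcal{I}_\sigma^{kj}(\beta_0)]_{k,j}$, $k = 1,\dots,q_0$, $j = q_0+1,\dots,q$;
$\mathcal{I}_\beta^{\circ\circ} = [\mathcal{I}_\sigma^{kj}(\beta_0)]_{k,j}$, $k,j = q_0+1,\dots,q$.

From Theorem 2 it follows that, with probability tending to one, the estimator $\hat\theta_n$ globally minimizes the following
objective function:

$$\begin{aligned}
\mathcal{F}_0(\theta^*) &= (\alpha^*-\tilde\alpha_n^*)\Gamma_\alpha^{**}(\alpha^*-\tilde\alpha_n^*)' - 2(\alpha^*-\tilde\alpha_n^*)\Gamma_\alpha^{*\circ}(\tilde\alpha_n^\circ)' + \tilde\alpha_n^\circ\Gamma_\alpha^{\circ\circ}(\tilde\alpha_n^\circ)' + \sum_{j=1}^{p_0}\lambda_{n,j}|\alpha_j|\\
&\quad+(\beta^*-\tilde\beta_n^*)\Gamma_\beta^{**}(\beta^*-\tilde\beta_n^*)' - 2(\beta^*-\tilde\beta_n^*)\Gamma_\beta^{*\circ}(\tilde\beta_n^\circ)' + \tilde\beta_n^\circ\Gamma_\beta^{\circ\circ}(\tilde\beta_n^\circ)' + \sum_{k=1}^{q_0}\gamma_{n,k}|\beta_k|.
\end{aligned}$$

Hence, the following normal equations hold:

$$\frac12\frac{\partial\mathcal{F}_0(\theta^*)}{\partial\alpha^*}\bigg|_{\alpha^* = \hat\alpha_n^*} = \Gamma_\alpha^{**}(\hat\alpha_n^*-\tilde\alpha_n^*)' - \Gamma_\alpha^{*\circ}(\tilde\alpha_n^\circ)' + A(\hat\alpha_n^*) = 0, \tag{5.2}$$

$$\frac12\frac{\partial\mathcal{F}_0(\theta^*)}{\partial\beta^*}\bigg|_{\beta^* = \hat\beta_n^*} = \Gamma_\beta^{**}(\hat\beta_n^*-\tilde\beta_n^*)' - \Gamma_\beta^{*\circ}(\tilde\beta_n^\circ)' + B(\hat\beta_n^*) = 0, \tag{5.3}$$

where $A(\hat\alpha_n^*)$ and $B(\hat\beta_n^*)$ are, respectively, $p_0$- and $q_0$-dimensional vectors. Their $j$-th and $k$-th components
are $\frac12\lambda_{n,j}\operatorname{sgn}(\hat\alpha_{n,j}^*)$ and $\frac12\gamma_{n,k}\operatorname{sgn}(\hat\beta_{n,k}^*)$. From (5.2), by simple calculations and recalling
that $\alpha_0^\circ = 0$, we have that

$$\sqrt{n\Delta_n}(\hat\alpha_n^*-\alpha_0^*)' = \sqrt{n\Delta_n}(\tilde\alpha_n^*-\alpha_0^*)' + (\Gamma_\alpha^{**})^{-1}\Gamma_\alpha^{*\circ}\sqrt{n\Delta_n}(\tilde\alpha_n^\circ)' - \sqrt{n\Delta_n}(\Gamma_\alpha^{**})^{-1}A(\hat\alpha_n^*).$$

By condition C1, $\sqrt{n\Delta_n}(\Gamma_\alpha^{**})^{-1}A(\hat\alpha_n^*) = \left(\frac{\Gamma_\alpha^{**}}{n\Delta_n}\right)^{-1}\frac{A(\hat\alpha_n^*)}{\sqrt{n\Delta_n}} = o_p(1)$. We therefore obtain

$$\sqrt{n\Delta_n}(\hat\alpha_n^*-\alpha_0^*)' = \sqrt{n\Delta_n}(\tilde\alpha_n^*-\alpha_0^*)' + (\Gamma_\alpha^{**})^{-1}\Gamma_\alpha^{*\circ}\sqrt{n\Delta_n}(\tilde\alpha_n^\circ)' + o_p(1),$$

where $(\Gamma_\alpha^{**})^{-1}\Gamma_\alpha^{*\circ}\xrightarrow{p}M = (\mathcal{I}_\alpha^{**})^{-1}\mathcal{I}_\alpha^{*\circ}$. Furthermore, inverting the block matrix $\mathcal{I}_\alpha$ gives

$$\mathcal{I}_\alpha^{-1} = \begin{pmatrix}(\mathcal{I}_\alpha^{**})^{-1} + M(\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}M' & -M(\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}\\ -(\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}M' & (\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}\end{pmatrix},\qquad \tilde{\mathcal{I}}_\alpha^{\circ\circ} = \mathcal{I}_\alpha^{\circ\circ} - (\mathcal{I}_\alpha^{*\circ})'(\mathcal{I}_\alpha^{**})^{-1}\mathcal{I}_\alpha^{*\circ}.$$

Since $\mathcal{I}(\theta_0)$ is block diagonal, condition B2 gives $(\sqrt{n\Delta_n}(\tilde\alpha_n^*-\alpha_0^*),\sqrt{n\Delta_n}\,\tilde\alpha_n^\circ)\xrightarrow{d}(Z_1,Z_2)\sim N(0,\mathcal{I}_\alpha^{-1})$.
Hence $\sqrt{n\Delta_n}(\hat\alpha_n^*-\alpha_0^*)$ converges in distribution to $Z_1+MZ_2$. By the properties of the conditional
multivariate Gaussian distribution, $E[Z_1\mid Z_2] = -MZ_2$. Therefore $Z_1+MZ_2 = Z_1-E[Z_1\mid Z_2]$ has covariance matrix

$$(\mathcal{I}_\alpha^{**})^{-1} + M(\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}M' - M(\tilde{\mathcal{I}}_\alpha^{\circ\circ})^{-1}M' = (\mathcal{I}_\alpha^{**})^{-1}.$$

Thus $\sqrt{n\Delta_n}(\hat\alpha_n^*-\alpha_0^*)$ converges to $N(0,(\mathcal{I}_\alpha^{**})^{-1})$. Similarly, from (5.3) we obtain that

$$\sqrt n(\hat\beta_n^*-\beta_0^*)' = \sqrt n(\tilde\beta_n^*-\beta_0^*)' + (\Gamma_\beta^{**})^{-1}\Gamma_\beta^{*\circ}\sqrt n(\tilde\beta_n^\circ)' + o_p(1),$$

since $\sqrt n(\Gamma_\beta^{**})^{-1}B(\hat\beta_n^*) = o_p(1)$ by condition C1. Therefore, $\sqrt n(\hat\beta_n^*-\beta_0^*)$ converges to $N(0,(\mathcal{I}_\beta^{**})^{-1})$. This concludes
the proof. □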
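The covariance identity in the last step can be checked in the simplest case: one non-zero and one zero drift parameter ($p_0 = 1$, $p = 2$), where all blocks are scalars. The Python sketch below computes the variance of $Z_1 + MZ_2$ with exact rational arithmetic. It uses a hypothetical function name and arbitrary illustrative values of the information entries. The result equals $(\mathcal{I}_\alpha^{**})^{-1}$, which is smaller than the $(1,1)$ entry of $\mathcal{I}_\alpha^{-1}$ (the variance of the unpenalized estimator) whenever $\mathcal{I}_\alpha^{*\circ}\neq0$.

```python
from fractions import Fraction


def oracle_limit_variance(i_ss, i_so, i_oo):
    """Scalar-block check of the last step of the proof of Theorem 3.

    With I = [[i_ss, i_so], [i_so, i_oo]], (Z1, Z2) ~ N(0, I^{-1}) and M = i_so / i_ss,
    return Var(Z1 + M Z2) and Var(Z1), the (1, 1) entry of I^{-1}.
    """
    det = i_ss * i_oo - i_so * i_so
    v11, v12, v22 = i_oo / det, -i_so / det, i_ss / det   # entries of I^{-1}
    m = i_so / i_ss
    return v11 + 2 * m * v12 + m * m * v22, v11


# Arbitrary positive definite information matrix (illustrative values only).
oracle_var, unpenalised_var = oracle_limit_variance(Fraction(2), Fraction(1), Fraction(3))
print(oracle_var, unpenalised_var)   # 1/2 3/5
```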
Figure 1: Density estimation of the LASSO-type estimates of the parameters of the process
$dX_t = -\theta_1(X_t-\theta_2)\,dt + (\theta_3+\theta_4X_t)^{\theta_5}\,dW_t$ over 1000 Monte Carlo replications. True values
($\theta_1 = 1$, $\theta_2 = 10$, $\theta_3 = 0$, $\theta_4 = 4$, $\theta_5 = 0.5$) are represented as vertical dotted lines.
Reference                       Model                                 α    β    γ
Merton (1973)                   dX_t = α dt + σ dW_t                       0    0
Vasicek (1977)                  dX_t = (α + βX_t) dt + σ dW_t                   0
Cox, Ingersoll and Ross (1985)  dX_t = (α + βX_t) dt + σ√X_t dW_t               1/2
Dothan (1978)                   dX_t = σX_t dW_t                      0    0    1
Geometric Brownian Motion       dX_t = βX_t dt + σX_t dW_t            0         1
Brennan and Schwartz (1980)     dX_t = (α + βX_t) dt + σX_t dW_t                1
Cox, Ingersoll and Ross (1980)  dX_t = σX_t^(3/2) dW_t                0    0    3/2
Constant Elasticity Variance    dX_t = βX_t dt + σX_t^γ dW_t          0
CKLS (1992)                     dX_t = (α + βX_t) dt + σX_t^γ dW_t

Table 1: The family of one-factor short-term interest rate models seen as special cases of
the general CKLS model. The columns α, β and γ give the values to which these parameters are
restricted; blank entries are free parameters.



Model    Estimation method                  α                β                 σ                γ
Vasicek  MLE                                4.1889           -0.6072           0.8096
CKLS     Nowman                             2.4272           -0.3277           0.1741           1.3610
CKLS     Exact Gaussian                     2.0069 (0.5216)  -0.3330 (0.0677)  0.1741           1.3610
CKLS     QMLE                               2.0822 (0.9635)  -0.2756 (0.1895)  0.1322 (0.0253)  1.4392 (0.1018)
CKLS     QMLE + LASSO, mild penalization    1.5435 (0.6813)  -0.1687 (0.1340)  0.1306 (0.0179)  1.4452 (0.0720)
CKLS     QMLE + LASSO, strong penalization  0.5412 (0.2076)   0.0001 (0.0054)  0.1178 (0.0179)  1.4944 (0.0720)

Table 2: Model selection on the CKLS model for the U.S. interest rates data. Table taken
from Yu and Phillips (2001) and updated with LASSO results. Standard errors in parentheses
when available.